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Abstract 

In this paper, we propose a primal IP method for solving the optimal experimental design 
problem with a large class of smooth convex optimality criteria, including A-, D- andpth mean 
criterion, and establish its global convergence. We also show that the Newton direction can 
be computed efficiently when the size of the moment matrix is small relative to the sample 
size. We compare our IP method with the widely used multiplicative algorithm introduced by 
Silvey et al. [27]. The computational results show that the IP method generally outperforms 
the multiplicative algorithm both in speed and solution quality. 

Key virords: Optimal experimental design, A-criterion, c-criterion, D-criterion, pth mean 
criterion, interior point methods 



1 Introduction 

In this paper, we consider the optimal experimental design problems on a given finite design space 
X = {xi, . . . , Xn} C 3fi™. In this setting, we consider a coefficient matrix K e gfj™^*^ of full column 
rank and the moment matrix defined as 

n 

M{w) = ^WiAi 

i=l 

for w G ri := {w : Wi > — 1}, where Ai is the expected Fisher information matrix 

related to Xi, i — l,...,n. As in [37] . throughout this paper we assume that A^'s are mxm 
real symmetric positive semidefinite matrices and that there exists an w € such that A4{w) is 
positive definite. This in particular implies that M{w) is positive definite for all positive w € fl. 
The optimal experimental design problem can then be formulated as the following minimization 
problem (see [25l Section 7.10]): 

r:=inf $(XH) :=*(Ck(7WH)) 

w (1) 

s.t. w eft, Range(X) C Range(7W(w)), 

where Vl/ is a function defined on the set of positive definite matrices and Ck{-M{w)) is the infor- 
mation matrix defined by Ck{-M{w)) := {K'^ {A4{w))''' K)~^ . Here A'^ denotes the Moore-Penrose 
pseudoinverse of a matrix A. The well-definedness of Ck{.M{w)) is guaranteed by the range inclu- 
sion condition in the constraint of ([T]) and the fact that K has full column rank [321 Chapter 3]. 
The function <& in the objective is commonly referred to as an "optimality criterion" . Some classical 
optimality criteria include (see [25l Chapter 6] ) : 
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(i) A-criterion $(X) := tr(/\^Xti^); 

(ii) c-criterion := c^X'^c; 

(iii) D-criterion $(X) := \ogdet{K'^ X'' K); 

(iv) pth mean criterion <I>(X) tr((ii:^Xtii')-P). 

for some p < 0, c € SR™ and iiT € 5ft™ '^'^ of full column rank. 

It is easy to observe that c-criterion is just a special case of A-criterion with K = c and A- 
criterion is a special case of pth mean criterion with p = —1. We shall also mention that pth mean 
criterion can be defined more generally to include D-criterion as a special case (see [151 Chapter 6] 
for details). Furthermore, it can be shown that the constraint set of ([1]) is convex [551 Section 3.3], 
and the criteria (i)-(iv) are convex functions in the constraint set (by using |25[ Theorem 5.14] and 
[m Theorem 6.13], or [Ml Proposition IV.14] and [Ml Proposition IV.15]). Hence, problem (P 
with these criteria is a convex optimization problem. Indeed, it is known that ([1]) with the above 
criteria can be reformulated as (possibly nonlinear) semidefinite programming (SDP) problems 
(see, for example, [H [3 H HSj) . 

The optimal design problems ([IJ with the aforementioned criteria usually do not have closed 
form solutions. Numerous procedures have thus been proposed to solve ([T]) (see, for example, 
[I1IMI1IS1I351I1II1II1IM1I1IM1II1I33])- Among them, the multiplicative algorithm introduced 
in [27] has been widely explored. For example, Titterington [28], Pazman [Mj, Dette et al. [I2] 
and Harman and Trnovska [19j studied the multiplicative algorithm for D-criterion. In addition, 
Fellman [T5] and Torsney [3T| considered the multiplicative algorithm for A-criterion under the 
assumption that all Ai's are rank-one. Recently, Yu [37] studied the multiplicative algorithm for 
a class of convex optimality criteria and proved its global convergence under some assumptions. 
Nevertheless, for several commonly used optimality criteria, some of those assumptions may not 
hold and hence there is no theoretical guarantee for its convergence. Indeed, as observed in [37l 
Section 5], one of the assumptions does not hold for pth mean criterion with p = —2. Moreover, 
for such a criterion, our numerical experiments in Section [5] demonstrate that the multiplicative 
algorithm appears not to converge when p < — 1. More details about the multiplicative algorithm 
for solving ([T]) are given in Section [S] 

In this paper, we consider an alternative approach to solve problem ([1]). In particular, we study 
interior point (IP) methods for ([1]), which are Newton- type methods and can be efficiently applied 
to a broad class of optimal design problems with moderate-sized matrices A^^s. More specifically, 
we develop a primal IP method for ([1]) with a large class of convex optimality criteria and establish 
its global convergence. We also demonstrate how the Newton direction can be computed efficiently 
when n ^ m^, i.e., when the size of Ai's is small relative to the sample size. We then compare the 
IP method with the multiplicative algorithm. The computational results show that the IP method 
usually outperforms the multiplicative algorithm in both speed and solution quality. 

The rest of this paper is organized as follows. In Subsection ll.ll we introduce the notations that 
are used throughout the paper. In Section [H we review the multiplicative algorithm and address 
its convergence. In Sections [S] we propose a primal IP method for solving problem ([T]) with a 
large class of convex optimality criteria and address its convergence. In Section [H we discuss 
how the IP method can be applied to solve problem ^ with criteria (i)-(iv) and demonstrate 
how the Newton direction can be computed efficiently when n ^ . In Section (5] we conduct 
numerical experiments to test the performance of the method and compare it with the multiplicative 
algorithm. Finally, we present some concluding remarks in Section |6l 

1.1 Notations 

In this paper, the symbol 3?++ denotes the set of all positive real numbers and 5ft" denotes the 
n-dimensional Euclidean space. For a vector x G 5ft" and I C {1, . . . , n}, \\x\\ denotes the Euclidean 
norm of x, xx denotes the subvector of x indexed by X and ^(x) denotes the diagonal matrix whose 
ith diagonal entry is xi for all i. For a G 5ft and a vector a; G 5ft" with positive entries, x" denotes 
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the vector whose ith entry is xf for all i. For x, y G^"^, xoy denotes the Hadamard (entry- wise) 
product of X and y. The letter e denotes the vector of all ones, whose dimension should be clear 
from the context. The set of all m x rt matrices with real entries is denoted by 3?™^". For any 
A G 3?™^", 1 C {1, . . . , m} and J C {1, . . . , n}, Uij denotes the (i, j)th entry of A, Aj denotes 
the submatrix of A comprising the columns of A indexed by J' and Axj denotes the submatrix 
of A comprising the rows and columns of A indexed by I and J^, respectively. The space of n x n 
symmetric matrices will be denoted by 5". If A €! 5" is positive semidefinite (resp., definite), we 
write A )^ (resp., A )~ 0). The cone of positive semidefinite (resp., definite) matrices is denoted 
by 5!f (resp., 5!^;+). For A,B eS"", Ah B (resp., A-^ B) means A- B (resp.. A- B-^0). 
The trace of a real square matrix A is denoted by tr{A). We denote by / the identity matrix, 
whose dimension should be clear from the context. 

A function / : 5" ^> 3? is said to be increasing (resp., decreasing) if for any A ^ B, it holds 
that 

f{A) > f{B) (resp., f{A) < f{B)). 



2 The multiplicative algorithm 

In this section we review the multiplicative algorithm introduced in |27| for solving problem ([T]) 
and discuss its convergence. In particular, we first describe the multiplicative algorithm as follows, 
which is specified through a power parameter A G (0, 1]. 

Multiplicative Algorithm: 

1. Start: Let a positive w'^ ^Vl and A G (0, 1] be given. 

2. For fc = 0, 1,... 



^ = l,...,n, (2) 



where di(w) = -iY{V^[M{w))Ai) and V^[M[w)) is the gradient of $ at M{w). 
End (for) 

Remark 2.1. The above algorithm is the same as the one described in '^37l, in the sense that both 
algorithms generate exactly the same sequence {w'^} provided the initial points w*^ are identical, m 

We now state a global convergence result recently established by Yu [37l Theorem 2] for the 
multiplicative algorithm when applied to solve the following problem, which is closely related to 
©: 

val := sup —^{M{w)) 

w (3) 
s.t. w G f2, M{w) >~ 0. 

Observe that ([IJ and ([3]) are equivalent (i.e., the optimal value being negative of each other) if 
there exists an optimal solution w* of ([1]) with Ai{w*) >- 0, or if $ is convex in 

5™(i^) {X G SI' : Range(i^) C Range(X)} 

and ([3]) has an optimal solution. 

Proposition 2.1. Let {w^} be the sequence generated from the above multiplicative algorithm. 
Suppose the following assumptions hold: 

(a) for any feasible point w of (jS]),, V^{M{w)) ^ and V^{M{w))Ai ^ for i — 1, . . . ,n; 

(b) for any feasible point w of ([3]), ifT(w) ^ w, then '^{M.(T(w))) < $(7W(w)), where 



[T{w)]i := Wi 



Y.l=iWj{dj{w)y 
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(c) $ is strictly convex and V<i> is continuous in S^_^; 

(d) for any {X''} C Sl\, if X* and {^{X^)} is decreasing, then X* >~ 0. 

Then ^{Ai{w'')) — > — val nionotonically, and moreover, any accumulation point of {w^} is an 
optimal solution of 

Remark 2.2. Notice that the assumptions in the above proposition imply that any accumulation 
point w* of {w''} satisfies Ai{w*) >- 0. Hence, if the assumptions in Provosition \2.1\ hold and $ is 
convex in S"''{K), then ([l} is equivalent to (jSj and any accumulation point of the sequence {w''} 
generated from the above multiplicative algorithm solves ([TJ . ■ 

Using Proposition 12.11 and some technical results developed in [37] , one can establish the con- 
vergence of the above multiplicative algorithm when applied to problem ([1]) with A-, D- and pth 
mean criterion for p g (—1,0) and K — I, which is summarized as follows. 

Corollary 2.1. Assume that K — I and Ai ^ for i — 1, . . . , n. Then the multiplicative algorithm 
converges for any A G (0, 1] when applied to problem ^ with D- and pth mean criterion for 
p e (—1, 0). Also, it converges for A-criterion when A £ (0, 1). 

As seen from Proposition 12.11 and Corollary 12.11 the multiplicative algorithm converges for a 
large class of optimality criteria $. Nevertheless, for some important convex optimality criteria, 
the assumptions stated in Proposition l2.1l mav not hold and hence there is no theoretical guarantee 
for its convergence. Indeed, as observed in [371 Section 5], the assumption (b) with A = 1 does not 
hold for pth mean criterion with p = —2. Moreover, for such a criterion, our numerical experiments 
in Section [5] demonstrate that the multiplicative algorithm appears not to converge when p < —1. 

Due to the aforementioned potential drawbacks of the multiplicative algorithm, we will propose 
an IP method for solving problem ([1]) with a broad class of optimality criteria $ including A-, D- 
and pth mean criterion in subsequent sections. 

3 IP method for a class of convex optimality criteria 

In this section, we propose a primal IP method for solving ([1} with a class of convex optimality 
criteria $ = o Ck- We make the following assumption on ^E* throughout this section. 

Assumption 3.1. The function ^ is convex, decreasing, twice continuously dijferentiable and 
bounded below in S™^. Moreover, for any bounded sequences {X'^} C S^_^_ with Ainin(A''^) — > 0, 
one has '^{X^) — > oo. 

Remark 3.1. We now make some brief comments on the above assumptions. 

(a) Assumvtion lS. l\ is fairly reasonable. Indeed, all optimality criteria described in Section\^satisfy 
this assumption. 

(b) Since the feasible set is not necessarily closed, problem ([T]) with a general convex optimality 
criterion may not have an optimal solution. However, when the optimality criterion satisfies 
A ssump tion \3.1\ it must have an optimal solution as shown in Theorem \3 . lY a) . We refer the 
readers to \25i Chapter 5] for more discussion on conditions guaranteeing existence of solutions 
for problem ([TJ. 

(c) In contrast to Proposition \2.1[ we do not require the existence of a positive definite optimal 
moment matrix Ai{w*). Indeed, Assumvtion \3.l\ may hold even when problem ([l} does not 
have a positive definite optimal moment matrix. For instance, the design problem 
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has a unique optimal solution at (wi,W2) = (1,0). The corresponding optimal moment matrix 
is not positive definite; thus, the assumption (d) of Provosition [KT\ does not hold. However, it 
is easy to check that Assumvtion \3. 1\ is satisfied for this design problem (with "^(t) — 1/t). In 
general, the assumption (d) of Proposition [KT\ does not hold when K is not invertible, while 
our Assumption \3.1\ is independent of K. 



Under Assumption lS-H it is not hard to show that the function is bounded below on the 

feasible set of (H]). Also, it is routine to show that the function $ is twice continuously diffcrentiable 
in 5™+. Furthermore, it can be shown that $ is convex in S™{K) by considering suitable Schur 
complements (see, for example, |23l Section 6]). We include a short proof below for the convenience 
of readers. Before proceeding, we state the following well-known fact, which concerns the Schur 
complement of a positive semidefinite submatrix (see, for example, [25, Lemma 3.12]). 

Lemma 3.1. Let A £ S'' , B € sfj^xfe (j ^ Then the matrix ( ^ (7 ) positive 

semidefinite if and only if A> B^C'^B, C ^0 and Range(i?) C Range(C). 
Proposition 3.1. The optimality criterion $ is convex in S"^{K). 

Proof. First of all, it can be shown that the set S^{K) is convex (see, for example, Section 3.3]). 
In addition, notice that for any X E S™{K), we have 

$(X) = *((/^^Xti\:)-i) = inf{*(J7): (K^X^'K)-'^ hU ^ 0} 

= inf I *([/): (^^^' ^^^hO,UyO 
= iui{'^{U) : X h KUK^,U )~ 0} , (4) 

where the second equality follows from the fact that is decreasing, the fourth and last equalities 
follow from Lemma [3TT| while the third equality holds because K^X^K is invertible for X G iS™(if ) 
when K has full column rank. Convexity of $ in S™{K) now follows from [261 Theorem 5.7]. ■ 

Observe that M{w) >- whenever w > Q. Thus, under Assumption 13. 1[ the function $ is twice 
continuously diffcrentiable for any positive w S Jl. It is hence natural to develop an IP method 
to solve ([1]) since such a method keeps all iterates in the relative interior of Vl until convergence. 
To proceed, we first reformulate the problem by eliminating the equality constraint. The resulting 
equivalent problem is given by 

/*=inf f{w):^^{M{Pw + q)) 

w 

s.t. e^w < 1, w>0, (5) 
Range(i<') C Range(7W(Pw + q)), 

where P G gfi"x("-i) and q G SR" are such that 

Pw + q=(^^_y^^ Vri)e5R"-^ (6) 

We next develop an IP method for solving problem ([5]) instead. First, we need to build a 
suitable barrier function. Given any w > satisfying e'^w < 1, one can observe that Pw + <z > 
and hence M{Pw + q) >~ 0, which leads to Range(i^r) C Range(A^ (Pw + q)). This implies that 
any barrier function that takes into account the first two inequality constraints of ([5]) is sufficient 
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for the development of IP method. Here we naturally choose the logarithmic barrier function and 
then solve the barrier subproblem in the form of 

n-l 

min ff,{w) := f{w) - f^Yl log(*0 " log (l " e^*) (7) 

i=l 

for a sequence of parameters fj, I 0. In view of Assumption 13. 1[ we see that any level set of is 
compact. Moreover, is strictly convex. Thus, there exists a unique minimizer to (O for any 
fi > 0. Furthermore, it follows from Assumption 13.11 that is twice continuously differentiable 
and its Hessian is positive definite in its domain. Therefore, problem ([T]) can be suitably solved by 
the Newton's method with a line search whose stepsize is chosen by Armijo rule. 
We are now ready to present our IP method for solving problem ([5]). 

Primal IP Method: 

1. Start: Let a strictly feasible < /?, 7, 77, ct < 1 and /ii > be given. Let e{ii) be an 
increasing function of ^ so that lim^^o = 0. Set w = and fc = 1. 

2. While \\Wf^,{w)\\>e{fik) do 

(a) Compute the Newton direction 

d:^-iV^U,{w))-'VU,iw). (8) 

(b) Let ainax(w) ■= max{a : wla] > 0, e"^w[a] < 1}, where w[a] :— w + ad. 

(c) Let a be the largest element of {«(«)), (3a{w), (3^a{w), ■ ■ ■} satisfying 

< U,{w)+aa{Vf^,{w)fd, 

where a{w) := min{l, ?7a„iax(i&)}- 

(d) Set w 4- w[a]. 

End (while) 

3. Set w'' ^ w, Hk+i *s~ fc fc + 1, and go to step 2. 

In standard convergence analysis of IP methods, the feasible sets are usually assumed to be 
closed and the objective functions are twice continuously differentiable in a neighborhood of the 
feasible sets (see, for example, [12]). Nevertheless, these two conditions do not necessarily hold 
for our problem ([5]). In particular, the objective function is not necessarily continuous up to 
the boundary of the feasible region [551 Section 3.16]. Hence, it is not immediately clear the 
sequence generated by our method will accumulate at a global minimizer of ([S]). Thus, we analyze 
convergence of our IP method below. We first present convergence analysis regarding the outer 
iterations of our IP method and leave the discussion on the convergence of its inner iterations to 
the end of this section. 

For notational convenience, in the remainder of this section, we associate with each w e 
a unique w £ 3?" by letting w := Pw + q. Analogously, we associate with each w € 3?" a unique 
w e 3?""^ by letting Wi = Wi ioT i ^ 1, . . . ,n - I. Also, we let ^m{w) := ^{M{w)). 

We first observe that if problem ([1]) has an optimal solution w* with A4{w*) >- 0, then there 
exists a Lagrange multiplier u* > such that (w*,u*) satisfies the following KKT system: 

T 

e w 
uo w 
{w, u) 



= 0, 

- 1, 

= 0, 

> 0. 



(9) 
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Given a strictly feasible point w £ di" ^ of problem (O, we notice that 

vf^{w)^p^{V'^>M{w)^^lw-^). (10) 

Then it is not hard to observe that for each /i > 0, the w associated with the approximate solution 
w oi ^ obtained by the Newton's method detailed in step 2 above together with u := iiw~^ 
satisfies the following perturbed KKT system: 

P^(V$A4H -u) ^ V, 

e^w = 1, 
uo w = /ie, 
{w,u) > 

for some v € In order to analyze the convergence of our IP method, we will study the 

limiting behavior of the solutions of system pT|) as — > (0+,0), that is, (/x,?;) — )■ (0,0) with 
A* > 0. 

We first claim that system (ITT|) has a unique solution for any (/i,w) G ^++ x Indeed, it 

is easy to observe that (w, u) is a solution of (ITT]) if and only if w G 3?""^ is an optimal solution of 

min ffj,{'w) — v'^w. (12) 



Since the objective function of ((T2|) is strictly convex and it has compact level sets, problem (|12p 
has a unique optimal solution, which immediately implies that system has a unique solution. 
From now on, we denote by {w{pL, v), u{fj,, v)) the unique solution of and by w{fi, v) the vector 
obtained from w{^,v) by dropping the last entry for all {fJ,,v) £ 5?++ x 3?"^^. It is clear that 
w{fj,,v) is the unique optimal solution of (I12p . We next establish the limiting behavior of w{p,v) 
as {fi,v) (0+,0). 

Theorem 3.1. Let {w{^,v),u{p,v)) be defined above for {n,v) £ 3?++ xW^^^. Then the following 
statements hold: 

(a) lim ^{A4{'w{p,v))) = f* and any accumulation point ofw{iJ,,v) as (//, w) (0+,0) is 
an optimal solution of ([T]). 

(b) Suppose that problem ([T]) has an optimal solution w* with Ai{w*) >- 0. Then any accumulation 
point of w{ii,v) as {n,v) (0,0), i.e., {p,v) — > (0,0) with (/i,w) £ {(/i,w) : H'^lloo < 

Cfi} for some given C > 0, is an optimal solution of ([T]) with maximum cardinality. 

(c) Suppose that problem ^ has an optimal solution w* with Ai{w*) >- 0. Letu* be the associated 
Lagrange multiplier satisfying ©• Assume that \B\ + \N\ = n, where B := {i : w* > 0} and 
N := {i : u* > 0}. Suppose further that [V^<l'jVi(u))]gg >- for any w £ ft satisfying wis > 
and wj^ — 0. Then w{fj,,v) converges to an optimal solution of ([T} with maximum cardinality 
as {n,v) (0+,0). 

Proof. We first prove part (a). Let 

/* := inf{$^(w) : e^w = l,w> 0}. (13) 

w 

We first show that lim ^m(w(ii,v)) = f*. 

(m,«)^(o+,o) 

Given an arbitrary e > 0, there exists a positive w satisfying e'^w < 1 such that f{w) < /*+e/2. 
Recall that wdijv) is the unique optimal solution of (HH). Then we have that for any v £ 3?""^, 

f^{w{fi, v)) - v'^wifi, v) < ff,{w) - v^w. (14) 
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On the other hand, note that w{fi,v) > and e^w(/i,w) < 1. Hence, 

n-l 

- log(^i(/^' ")) ~ log ~ e'^w{n, v)) > 

i=l 

and f{w{p,v)) > f*. In view of these inequaUties, (IT4l) and the fact that < 1 and 

||w||i < 1, one can obtam that for any £ 3?++ x 3?""-'^, 

/* < f{w(p,v)) = f^{w{n,v)) + ^^Ylog{w^{^l,v)) + filog (1 - e^w{n,v)) 

n-l 
n-l 

< /* + ^ - M ^ log(wO - /ilog (1 - e^w) + 2\\v\\^. 

i=l 

Thus, there exists some S > such that /* < f{w{fi,v)) < f* + e whenever < S, ^ > 0. 

Hence, <i>M{w{fi,v)) = f{w{n_,v)) f* as {fi,v) ^ (0+,0). 

We next show that f* — f*. Clearly, f* < f*. We now suppose for contradiction that /* < /*. 
By the definitions of /* and /*, there exist and which are feasible points of ([T]) and (fT3|) . 
respectively, so that ^m{w'^) < (/* + /*)/2 and ^Miw"^) < f* + if*-f*)/'2- Let w = {w'^+w'^)/2. 
Clearly, w > 0, e^w = 1 and Range(iir) C Range(A^(w)) due to M.{w) >~ 0. By convexity of $ in 
S'^{K), we obtain that <^m{w) < {^Mi^^) + *7m(w'^))/2 < /*, which is a contradiction to the 
definition of /*. Thus, lim ^M{w(fJ,,v)) = f* = f* ■ 

(m,i')^(0+,0) 

Now suppose that w* is an accumulation point of w{^,v) as (/i, w) — > (0+,0). We next show 
that w* is an optimal solution of ([IJ. Indeed, it follows from (jH) that for any feasible point w of 

^m{w) = inf : M{w) > KUK^, U ^ O} . (15) 

In view of (fTSj) . for each (/i, u) G x W"^^, there exists v) )^ such that 

+ IKM^t')!! > and M{w{fi,v))hKU{n,v)K'^. (16) 

From the second relation in (fT6|). we see that tr(7M(w(/i, «))) > Amin(-f'^^^''^)tr(?7(/i, v)), from which 
it follows that U(fi,v) is bounded and thus it has an accumulation point as {p,v) — > (0+,0). Let 
U* be such an accumulation point. In view of the first relation in (jl6p and the assumption on VP, 
we see that U* >- 0. Moreover, we obtain by taking limit in (fT6|) upon {fi,v) (0+,0) that 

lim $>((u;(Ai,w)) > *(?/*), M{w*) h KU*K'^. (17) 

(ai,ii)->(0+,0) 

The second relation in (IT71) together with Lemma [5TT] implies that 

Miw*) y KU*K^ ^ i _^^,JbO ^ Range(i^) C Range(7U(w*)). 

Hence, w* is a feasible point of ([1]). In view of ([T5|) . the first relation in p7)) and the result 
lim ^Miw(n,v)) = /*, we have 

$^(w;*) < *(t/*) < lim ^Miw{^l,v))=r. 

(P.d)^(0+,0) 

Thus, w* is an optimal solution of ([Ij. This proves part (a). 
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We now show that part (b) holds. Let w* be an optimal solution of ([Ij with maximum 
cardinality. Then it follows immediately from assumption that A4{w*) >- 0. Thus, there exists a 
corresponding Lagrange multiplier u* so that {w* , u*) satisfies ([9]). Let w* be the vector obtained 
from w* by dropping the last entry. In view of ^ and the first equation of ^ and ([Tl]) . we 
observe that for any {n,v) S Sc, 

{w{^,v) - w*f{u{n,v) - u*) 
= {Pw{p,, v) — Pw*)'^ (u(fi, v) — u*) 

= {w{fJ.,v) - w*)'^P'^{V<^>M{w{ti,v)) - V<I>A4(w*)) - iw{p,v) - w*Yv 
> -2Cti, 

where the last inequality holds since $ is convex in 'w{fi,v),w* E fl and ||f||oo < C^. Using 

this inequality and the third equation in ([9]) and (|lip. we see that 

w*'^u{n,v) + w{fi,vfu* < w*'^u* + w{fi,vfu{fi,v) + 2Cfi = {2C + n)fi. (18) 

Dividing both sides of the above inequality by and using the third equation of (fTTj) . we obtain 
that 

n ^ n ^ 

E^^ + E^^ ^ 2C + n. (19) 
Since {w*,u*) > and (w(/i, w), u(^, w)) > 0, it follows from (|T9l) that for all i, 

It immediately implies that the ith entry of any accumulation point of u)(/i, v) as (/i, w) ?■ (0, 0) 

must be positive whenever w* > 0. Since is an optimal solution of ^ by part (a), we conclude 
that part (b) holds. 

Finally, we show that part (c) holds. By assumption, {w* , u*) is a solution of ^ with jSI + IA/"! = 
n. Notice from the third equation of (0) that B r\ N — 0. Thus, B and M form a partition for 
{1, . . . , n\. We first show that when (/x, v) G and /i is sufficiently small, 

WAf{f^,v) 0{fi), Wi{^i,v) = e{l), i e B, ^^^^ 
utsip, = 0(/i), Ui{fj., v) = 6(1), i e TV. 

Since > and u)^ > 0, it follows from (fTSi) that wj^[iJi,v) — 0{fi) and ug(^, w) = 0(/i) for all 
(z^,!") e S(7. In addition, in view of (|20| and the fact that w{p,,v) G il, one can immediately see 
that Wi{fi,v) — 9(1) for alH G S and (/x,w) e Sp. We now show that when {fj,,v) G and jj, is 
sufficiently small, Ui(/i, u) = 6(1) for all i G A/". Indeed, using the first equation of pT|) . we obtain 
that 

P^V$^(u;(Ai,v)) - {P^)b mb(m,«) - (^^)^ u^{p,v) = w. 

Since G fl, we know that \B\ > 1, which together with the definition of P implies that {P'^)j\f 
has full column rank. It then follows from the above equation that 

u^{p,v) = ([(P^)a^]^(P^)aa)"' [{P^Uf {P^V^M{w{^l,v)) ~ {P^)b - v) . (22) 

Recall from above that Wi{ii,v) — 0(1) for all i G S and (/i,v) G Sp. Using this result and the 
fact that w(/i, w) G ft and M{w*) >- 0, it is not hard to see that {A4{w{fi,v)) : (/i,w) G Sc} 
is included in a compact set in Hence, P^V$x(w(/i, w)) is bounded for all (/i, w) G Sc- 
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Further, in view of (PU]) . ([^ and the result that usin, v) = 0{^) for all {fi, v) e 5c, we easily see 
that Ui{n,v) = 0(1) for alH e A/" when [ji^v) G Sc and is sufficiently small. 
For all (a*, v) e x 3*"-!, let 

w+(/x,'y) := ^u;b(/x,'(;), iwAA(Ai,w)^ , u+{n,v) := Qub(^, w), ma/-(^, w)^ , (23) 

^ (J ;) , ^ ("/ ; 

Then it follows from (|lip that (i(;+(/i, u), u+(/i, w)) is the unique solution of 

^P^(V$m(/i(a*)?«) - /2(A*)^^) - 
F{w,u,n,v) := \ e^(/i(/i)w) - 1 | = 0, (w, u) > (24) 

u o It; — e 

for aU {fi,v) e 3?++ x 3?"^^. In view of (PT|). we know that when (/^, w) € Sc and ^ is sufficiently 
small, 

w+^i^i,v) = 0(1), zi;+(At,^') = 6(1), z e 

Thus, (w^(/i, ?j), M+(/i, u)) has accumulation points as {^J.,v) (0,0). Let (w^,u*) be one such 

accumulation point. Clearly, {w'^, u*) > due to the third equation and the inequality in (1241) . We 
will show below that (w"'"(/i, w), u+(^, v)) — )• (w*, m*) as (^, v) — )■ (0+, 0). 

First, it is easy to see that (w*,m^,0,0) satisfies Since M{w*) >- and > 0, it is 

not hard to verify A4{Ii(Q)w^) >- 0. Using this result and Assumption 13.11 we observe that F 
is continuously differentiable in a neighborhood of (w*,m^,0,0). The Jacobian matrix of F with 
respect to {w,u) at (w*,m^,0, 0) is given by 

^P^V2$^(/i(0)z«-)/i(0) -P^/2(0)^ 
J := I e^/i(0) 

We next show that the Jacobian matrix J is nonsingular. It suffices to show that the linear system 
J ^^^^ = 0, or equivalently, 

P^V2$^(/i(0)u)*)/i(0)Au) - P^/2(0)Aw = 0, 

e^(/i(0)Au;) =0, (25) 

o Aw + o Au = 0, 

has only zero solution. Indeed, we observe that the null space of is spanned by e. It then 
follows from the first equation of that 

V^$^(/i(0)w'')/i(0)Aw - /2(0)Aw = Ae, (26) 

for some A G 9?. Multiplying both sides of by (/i(0)At«)-'" and making use of the second 
equation of (P5|) and the fact /i(0)/2(0) = 0, we arrive at 

Au;'^/i(0)V2$A4(/i(0)w*)/i(0)Au; = 0, 

which is equivalent to 

AwI[VHm{Ii{0)w^)]bb^wb = 0. 

This together with the assumption implies that Awb — 0. Using this result, the third equation of 
?5|) and the fact -w* > 0, we see that Aug = 0. Substituting Awb — into we obtain that 
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— /2(0)Ait = Ae, which together with the definition of /2(0) and \B\ > 1 imphes A = and hence 
Au^f = 0. Using this result, the third equation of (|25)) and the fact > 0, we see that Awj^f = 0. 
Thus, we have shown that Aw = Au = 0. Hence, the Jacobian matrix J is nonsingular. 

By applying the implicit function theorem to ([24]), we conclude that there exists e > 0, a neigh- 
borhood U oiO and a continuously differentiable function (w^ {fj,, v) , {fi, v)) defined on (— e, e) x W 
such that 

F{wi{fi,v),ui{p,v),fi,v) = 0, {w^{fi,v),uHp,v)) > \/{p,v) e {-e,e) xU, 
lim {w^ (p,v),u^{iJ,,v)) — {w^ ,u^). 

(Ai,'u)^0 

Recall that system has a unique solution u), w)) for all G x 3ff"^^. 

Therefore, w), w)) = {w^{i.i,v),u^{ij,,v)) for all (/i, w) £ (0, e) xU. It then follows that 

lim w), u)) = (w*, u*). 

{tJ.,v)^{0+,0) 

Using this equality and (1^^ . we finally conclude that 

lim wb(u,v) — lim wtiu.v) = wt > 0, 
lim w\r(n,v) — lim nwtriu.v) = 0. 

From part (b), we further see that (u)g,0) is an optimal solution of ([l} with maximum cardinality. 
This proves part (c) of the theorem. ■ 



As an immediate consequence of Theorem 13. 11 we now state a global convergence result regard- 
ing the outer iterations of our IP method. 

Corollary 3.1. Let {/i/c} and {w'^} be the sequences generated in the primal IP method. Let 
— Pw^ + q for all k. Then the following statements hold: 

(a) lim $(A^(u''^')) — f* and any accumulation point of {w'^} is an optimal solution of ([1]). 

(b) Suppose that problem ([T]) has an optimal solution w* with Ai{w*) >- and e{fJ-k) — 0{^k)- 
Then any accumulation point of {w^^ is an optimal solution of ([l} with maximum cardinality. 

(c) Suppose that problem ([T]) has an optimal solution w* with M(w*) >- 0. Letu* be the associated 
Lagrange multiplier satisfying Q. Assume that \B\ + \M\ = n, where B :— {i : w* > 0} and 
Af := {i : u* > 0}. Suppose further that [V^$A4('"^)]6e ^ for any w G il satisfying wjg > 
and — 0. Then {w''} converges to an optimal solution of ([T]) with maximum cardinality. 

Proof. Let v'' — V/^j.(w*'') and u'' = iik{w^)^^ for all k. In view of (ITU|) and (dJ), we can observe 
that [w^ ^u^ , jikiV^) satisfies dTTI) . By virtue of the definition of {w{fj,,v),u{ii,v)), we then have 
(u/'^u^) = w*"'), u(/ife, u'^)), which together with the fact ~> and Theorem 13.11 implies 

the conclusion holds. ■ 
Before ending this section, we establish a convergence result regarding the inner iterations of 
our IP method. 

Proposition 3.2. Let /i/j > and e{fik) > be given. Then the Newton's method detailed in step 
2 of the primal IP method starting from any strictly feasible point it)'"" of ([5|) generates a point 
satisfying || V/^^. < e(/ifc) within a finite number of iterations. 

Proof. First, observe that all iterates generated by the Newton's method lie in the compact level set 
T := {w : ffn,{w) < //ifc ('fi''"")}- Furthermore, it holds that w > and 1 — e'^w > for all w ^ T. 
This together with the assumption that M{n) n ^ ^ imphes that A^(T) C Sl\. Thus V/^^ 
and V^/^j. are continuous in T. Using this observation and the strong convexity of /^j. in T, there 
exist A, A > such that XI ^ V^/^^ {w) ^ XI for all w gT. This relation along with the continuity 
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of V/^^, and ^^f^j.^ implies that d = —iy'^ f f fi^{'w) is continuous in T. In view of this 
result and the definition of a(w), it is not hard to show that a{w) is positive and continuous in 
T. This fact together with the compactness of T yields a :— inf{a(w) : w e T} > 0. Thus, all 
iterates w generated by the Newton's method satisfy XI < V^/^fc(?Zi) < XI and a{'w) £ [a, 1]. The 
remaining proof follows the same arguments as in the proof of [221 Theorem 3.13]. ■ 

4 IP method for classical optimality criteria 

In this section, we discuss how to apply our IP method to solve problem ([T]) with A-, D- and pth 
mean criterion. In particular, we will demonstrate how the Newton direction ([8]) can be efficiently 
computed for each criterion. 

Before proceeding, we introduce some notations that will be used in this section (see, for 
example, [29] for more details). Given matrices A and B in 9?™^", A(E) B denotes the Kronecker 
product of A and B, while A o B denotes the Hadamard (entry-wise) product of A and B. In 
addition, vec{A) denotes the column vector formed by stacking columns of A one by one. For any 
m X m symmetric matrix U, we define the vector svec(J7) G s)fjm(m+i)/2 g^g 

SVec(C/) = (Uii, V2W21, . . . , \/2u„i, U22, V2U32, . . . , V2w„2, ■ • ■ , Umm)'^ ■ 

It is not hard to observe that svec is an isometry between 5™ and s)ff™(m+i)/2 ^^^^ moreover, 

tr{UV) = svec{Uf svec{V) VC/, V eS"". (27) 

We denote the inverse map of svec by smat. Clearly, they are adjoint of each other, namely, 

u'^ svec(F) = tr(smat(u)F) Vu G gfj™(™+i)/2^ "i/ (= 5™. 

The symmetric Kronecker product of any two (not necessarily symmetric) matrices G, H G 
sjfjmxm ^ square matrix of order m{m + l)/2 such that 

{G®sH)s^fec{U) ^]^swec{GUH'^ + HUG'^) Vt/ G 5". (28) 

As mentioned in [29j, G^sH can be expressed in terms of the standard Kronecker product of G 
and H as follows: 

Gt^sH = ^Q{G <S)H + H(» G)Q^, 

where Q G s)fi™('"+i)/2xm" jg g^^h that 

Qvec(C/) = svec([/), g^svec(C/) = vec([/) Vt/ G 5™. (29) 

It is easy to observe that the above Q exists and is unique. Moreover, QQ^ = I. 

Throughout this section, for each optimality criterion we define the associated function (j) as 
follows: 

(/)(a;) = $(smat(x)) (30) 

for any x G s)fi'"(™+i)/2^ provided that $(smat(a;)) is well-defined. It is clear to observe that (p is 
convex due to the convexity of Define 

M := [ svec(^i) . . . svec(A„)]. 

Clearly, M G sft™(™+i)/2xn. 

With the notations above, the function defined in ([7|) can be rewritten as 

n-l 

f^{w) = (j){M {Pw + q)) - /i X! ^og{m) - log (1 - e^w) . 

i=l 
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By the chain rule, the gradient and Hessian of are given by 



V/^(w) = P'^M'^\/(f){Mw) - nP^w-\ 



(1 - e^w) 



V2/m(w) = P^M^^'^(l){Mw)MP + ^^TT-r^ee"^ + fi^iw'^), (31) 

where w = Pw + q. 

The main computational effort of our IP method lies in computing the Newton direction d 
by solving the system S/^f^{w)d = — V/^(w) (see (|8])). In applications, n can be significantly 
larger than m^. Since the rank of 'V'^(j>{Mw) is at most m{m + l)/2, the first matrix in ((3T|) has 
"low" rank compared to V'^ ffj.{w). It is generally more efficient to compute the Newton direction 
via the Sherman-Morrison- Woodbury formula. To this end, suppose that V^</)(Mw) has rank r. 
Let VDV^ be the partial eigenvalue decomposition of V'^(j>{Mw), where D is the r x r diagonal 
matrix whose diagonal consists of r largest eigenvalues of V^(j){Mw), and the columns of V are the 
corresponding eigenvectors0 Due to the convexity of cf), one can observe that V^0(Mw) = VDV^ . 
It then follows from pTjl that 



(1 — wy 



which together with the Sherman-Morrison- Woodbury formula yields the Newton direction 



V/p(zi)), 



where 




I fV^MP 



-1 



&{w^){P^MTV e) 



When n ^ m , the above approach is much more efficient than solving the Newton system directly 
by performing Cholesky factorization of V^ff^^w). We remark that the ideas of using Sherman- 
Morrison- Woodbury formula to solve specially structured Newton systems have been explored in 
literature (see, for example, dHH]). 

As seen from above, V(/)(Mw) and V'^(j){Mw) are needed to compute Newton direction. For 
the rest of this section, we will discuss how to evaluate these two quantities for A-, D- and pth 
mean criterion, and determine the rank of V^0(Mt«) which is used to perform the aforementioned 
partial eigenvalue decomposition of 'V'^(f){Mw). 

4.1 IP method for pth mean criterion 

Recall from Section [T] that in 5™^ , the pth mean criterion $ becomes 

$(X) = tr{{K'^X-^K)-P) (32) 

for some p < and K e 3?™^'^ with full column rank. It is easy to check that Assumption 13 . 1 I holds 
for Hence, problem ([T]) with this criterion can be suitably solved by our IP method proposed 
in Section [3] 

Based on the above discussion, we know that our IP method needs the gradient and Hessian 
of the associated function cj) for computing Newton direction, where (j) is defined by (|30|) . We next 
discuss how to compute them. Before proceeding, we state the following classical result (see, for 
example, [TI] Proposition 4.3]) that will be used subsequently. 



^The partial eigenvalue decomposition can be efficiently computed by the package PRO PACK |21| . 
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Lemma 4.1. Let 5 : — > 5R &e a differentiable function and let : 5™ — > 5™ be defined by 



g°{Y) := V 



9{d2) 

V 9(.dm)) 



where V&{d)V'^ is an eigenvalue decomposition of Y for some d € 9?™. Then the function is 
well-defined, i.e., it is independent of the choice of V and d, and is also differentiable. Moreover, 
let S^''^ £ 5™ be a symmetric matrix whose {i,j)th entry is given by 



g{di) -g{dj) 
s?f :=! d, - d, " ^ 



if di 7^ dj , 



1 

g'{di) otherwise. 



Then the directional derivative of g^ at Y along the direction H e 5™ is given by 

V{S<^''^o{v'^HV))V'^. 



Proposition 4.1. Let $ be defined in (j32p and the associated (J) be defined in (jSOj). Let Q G 

s](jm(m+i)/2xm defined in (|29|) . Then the gradient and Hessian of (j) at any x £ svec(5™_|_) are 
given by 

W(i){x) =psvec{X-'^K{K'^X-^K)-P-^K^X^^), (33) 
W^(j){x) ^ Q{-p[{X-^KV) (E) {X-^KV)]^^{vec{S<^''^))[{X-^KV) (g) {X-'^KV)f 

~pX-^®G~pG®X-^)Q^, (34) 

respectively, where X = smat(a;), V2!{d)V'^ is an eigenvalue decomposition of K^^X^^K for some 
d e 3?", g{t) = t-P-^, and G = X-^K[K^X-^K]-p-''K^X-^. Ln particular, when K ^ L, the 
above gradient and Hessian reduce to 

Vipix) =pswec{XP-^), (35) 
V20(x) = Q{V ® V)2!{wec{S3''^)){V ® VfQ^, (36) 

where git) — ptP~^ and VSl{d)V^ is an eigenvalue decomposition of X for some d G 3?™. 

Proof. To derive the gradient of 4>, we fix an arbitrary x E svec(5"'i|_). Let X = smat(x). For ah 
sufficiently smaU h £ s)fj'm(m+i)/2^ have X + H y 0, where H ~ smat(/i), and moreover, 

{X + ny^ ^ x^^ - x^^Hx^^ + o{H). (37) 

Using dSZl) and Lemma Hj] with g{t) = t^P and Y = K^X^^K, we obtain that 

$(X + H)^ tT{{K'^[X + H]-^K)-P) = tT{{K^X-^K - K^X-^HX-^K + o{H))-p) 

= $(X) - tT{V{S^''^ o {V^K^X-^HX-^KV))V'^) + o{H), (38) 

where VS'{d)V'^ is an eigenvalue decomposition of Y. Letting R :— —K^X^^HX^^K and using 
the fact that V^V = I and sf^"^ = —pd^P^^ for all i, we further have 



triViS^^' o (V^RVW^) = tr{S^^' o (V^ RV)) = ^ sff ^ v,,r,. 



Vki 



-pJ2 E^j^^r""'^^* "-^^ = -Hp{k^x-^k)-p-^r) 

tTipX-^K{K^X-^KyP~^K^X-^H). (39) 
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In view of the definitions of (/), $, X and H, it follows from ^S^, and ^7} that 

(l){x + h)~ (j){x) = $(X + H)- $(X) ^ {psvec{X-^K{K^X-^K)-P-^K^X-^)) + o{h), 

which yields ([55]) . And (PSj) immediately follows from ([55)) by letting K — I. 

We next derive the Hessian of (/) at any a; S svec(5"':^). To proceed, we first recall the following 
well-known results (see, for example, page 243 and Lemma 4.3.1 of 20J): 

^^ec{ABC) = {C^ ® A)wec{B), {A (g) Bf = (g) . (40) 

Let X, h and H be defined as above. Using ([37)) and Lemma [4.11 with g{t) — t^P^^ and Y = 
K^X-'^K, we have 

V$(X + H) = p{X + H)-^K[K'^{X + H)-'^K]-p-^K'^{X + i/)"^ 

= p{X-^ - X-^HX-^)K[K^{X-^ - X-^HX-^)K]-P-^K^{X-^ - X-^HX-^) + o{H) 

= V4>(x) - p{x-^K)v{s^''^ o {v'^k'^x-^hx-^kv))v'^{k'^x-^) 

-pGHX-^ -pX-^HG + o{H), (41) 

where G is defined as above. Since X is symmetric, it follows from ([40)) that 

wec{{X-^KV){SS^'^ o {V^K'^X-^HX-^KV)){V^K^X-^)) 

^ [{X-^KV) ® {X-^KV)] vec(5S''^ o (V^K^X-^HX-^KV)) 

= [{X-^KV) (g> {X-^KV)]&{vec{SS''^))vec{V^K^X-^HX-^KV)) 

= [{X-^KV) (E) {X~^KV)]&{^rec{S3''^))[{X-'^KV) {X-'^KV)f vec(iJ). (42) 

In addition, since G is symmetric, we further have that 

vec{GHX-^ + X^^HG) = [X^^ ®G + G® X^^] vec(iJ). (43) 

In addition, by virtue of ([^^. ([5(1)) . the definition of X and i/, and the fact that svec is the adjoint 
operator of smat, one can have 

V0(a; + h)- V(t){x) = svec(V$(X + H) - V$(X)) = Q vec(V$(X + iJ) - V$(X)). 

This relation together with ((29)), ((4l])-((43]) and the definition of H yields 

V(l){x + h)- V(t){x) ^ Qi-p[{X-'^KV) (g) {X-^KV)]^{vec{S3''i))[{X^^KV) (g) {X-^KV)f 

-p X-'^ (g G ~ p G (g X-^) vec(iJ) + o(g vec{H)) 

= Q{-p[{X-^KV) (g) (X-iii:V^)]^(vec(S'9'''))[(X-iifV^) g) {X-^KV)f 
-p X-^ g)G-pGg) X-i)Q^ svec(iJ) + o(svec(iJ)) 

= g(-p[(x-iifT/) «) (x-iii:t/)]^(vec(S'f'''))[(x-ii4:i/) (x-^iCF)]'^ 

-p X-^ g)G-pGg) X-^)Q'^h + o{h), 

and hence ([34)) holds. 

For the case when K ~ I, W'^cf) can be directly derived as follows. We know from (1551) that 
V"I>(X) = pX^^-'^. Letting g{t) — ptP^^ and V!^{d)V'^ be an eigenvalue decomposition of X, it 
follows from Lemma Pl.ll that 

V$(X + iJ) = V$(X) + V{S<^''^ o (l/^i?l/))l/^ + o(i/). 

In view of ([40l) . one can see that 

vec(F(59''' o {V'^HV))V'^) ^{Vg)V) vec{S<^''^ o (V^'^i/y)) 

= {Vg) V)[vec{S3''^) o vec(F^i7F)] 

= (F (g V){wec{SS^'^) o [(F ® t/)"^ vec(i7)]) 

= (F (g) F)^(vec(5S'''))(T/ g) F)^ vec(i7). 
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Using these relations and a similar proof as above, we can see that ([M]) holds. ■ 

As mentioned earlier, we need to know the rank of V^^(a;) for performing the partial eigenvalue 
decomposition of V^(/)(a;) which is used to compute Newton direction. In the next proposition, we 
determine the rank of V^0(a;) at any x E svec(5™_^). 

Proposition 4.2. Let $ be defined in (j32p and the associated (j) he defined in (j30p . Then the rank 
ofW'^<j>{x) is m{m + l)/2 — (to — k){m — fc + l)/2 for any x E svec(5™_^). 

Proof. Let x E svec(5™_|_) be arbitrarily chosen. Define X = smat(x). Let G, V, d and S^'"^ be 
defined in Proposition 14. II with git) = t~^~^. For convenience, we define 

Ml = [{X-^KV)®{X-^KV)]S!{^rec{S!l^'^))[{X-^KV)®{X-^KV)]^, 
M2 = X~^ (SG + G®X-\ 

To determine the rank of \7'^(f>{x), it suffices to know the dimension of the null space of V'^(j){x), 
denoted by Null(V^(/)(a;)). Notice that is a twice differentiable convex function in svec(5™^). 
Thus, V'^(f){x) >z 0. It implies that h E Null(V20(x)) if and only if K^W'^(j){x)h = 0. We will 
subsequently show that 

h^\'^<P{x)h = ^ K^X-^H = 0, (44) 
where H = smat(/i). It then follows that 

h E Null(V^0(x)) 4^ K'^X-^H = 0. 

Notice that X~^ has full row rank. Thus, there exist nonsingular matrices Ei and E2 such 
that K^X~^ = El (/ 0) £'2, where / is the identity matrix of order k. It then follows that 

K'^X-^H = ^ (/ 0) J7 = 0, 

where U = E2HE^ e 5™. It is easy to sec that the dimension of {[/ e 5™ : (/ O) U = 0} 
is {m — k){m — k + l)/2. Since E2 is invertible, we conclude that the dimension of {H E 5™ : 
K^X~^H — 0} is also [m — k)(m — k + l)/2. Since smat is a one-to-one map between s)fi'"(™+i)/2 
and 5™, the dimension of Null(V^0(x)) is (m — k){m — fc + l)/2, and hence the rank of V^0(a;) 
is m{m + — ira — k){m — fc + l)/2. To complete the proof, we next show that (|44p holds by 
considering two cases p < — 1 or — 1 < p < 0. 

We start with the first case p < —1. Notice that all entries of S^''^ are nonnegative and thus 
Ml > 0. Also, M2 h 0. It then follows from (jMl) and ^ that h^V'^(l){x)h = if and only if 

vec{H)^Mi vec{H) = 0, wec{H fM2 vec(iJ) = 0. (45) 

By (|43|) . the second equality of (|45|) becomes 

tT{HX-^K{K'^X-^KyP-^K^X-^HX-^) = 0, 

which is equivalent to 

tT{X--2HX-^KiK^X-^K)~'^iK^X-^K)-^K^X-^HX--2) = 

^ {K^X-^K)-'^K^X-^HX--2 =0 ^ K^X-^H = 0. ^^^^ 

Moreover, K^X^^H = implies that the first equality of (|45l) holds. Therefore, (|^5|) holds if and 
only if K^X-^H = 0. It follows that ^ holds for p < -1. 

We next show that also holds for —1 < p < 0. Indeed, for such p, all entries of S^''^ are 
negative and hence —Mi >: 0. Using Proposition 14. 1[ we see that h^V^4>{x)h = if and only if 

vec{Hf{Mi + M2) ^rec{H) = 0. (47) 
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We claim that 

i vec(i7)^Af2 vec(iJ) > - wec{Hf Mi vec(i7). (48) 
Indeed, letting W = {K'^ X^^ K)^^ and using Lemma I5TT1 we have 



W-'^ = K'^X-^K => { ^-1 ) => X)^ KWK'^ 



The latter relation together with the definitions of M2 , G and (|43)) implies that 
i vec(iJ)^M2 vec(i7) = It{HX-^HX-^KWP+^K'^X-'^) 
= ti{[X-^H{X-^KWP+^K^X-^)^^X[X-^H{X-^KWP+^KTX-^)^) 
> tr{[X-^H{X-^KWP+^K'^ X-^)^'^ KWK'^[X-^H{X-^KWP+^K'^ X-^)^) 
= tT{HX-^KWK^X-^HX-^KWP+^K^X-^) 



(49) 



Let Z = V^K'^X-^HX^^KV. Notice that W = VS!{d^^)V^. Using this relation, the definition 
of Z and (HUl) . we have 

tr(iJX-iii:VKi^'^X-iiJX-iii:W^?'+i/^^X-i) = tr(iJX-ii^F^(d-i)Zi^(d-?'-i)T/'^K^X-i) 
= tr(^(d-i)Zi?(d-P-i)Z) = vec(Z)^[^(d-i)® ^((i-P-i)]vec(Z), 

which together with ([^^ yields 

ivec(ff)'^M2vec(i7) > vec(Z)^[^(d^i) «) ^(d^P-i)] vec(Z). (50) 

Also, by the definitions of Mi, Z and PO)) . we obtain that 

- wec{H)^Mi vec(iJ) = vec(Z)^^(vec(-5^'^'^)) vec(Z). (51) 
Since 1 > p + 1 > and o?i > for all i, it is not hard to show that 

^ ^ - d,-d, ' 

whenever d^ ^ dj. Thus, ^(d'^) (g) ^(rf-P-^) ^ ^(vec(-S'f ''')), which together with ^ 
and dH]) implies that (gS]) holds. It then follows from (gS]), (H?]) and the fact M2 h that 
vec(i?)^M2 vec(iJ) = 0. The rest of proof is similar to the case p < —1. ■ 



4.2 IP method for A-criterion 

Recall from Section [1] that in S™^, the A-criterion $ becomes 

<P{X) = tr(A"^X-iii:) (52) 

for some K G 3?™'^'^ with full column rank. Since A-criterion is a special case of pth mean criterion, 
the IP method discussed in Sections [3] and I4?l1 can be suitably applied to solve problem ([Ij with 
A-criterion. We next show that by exploiting the special structure, we can obtain a more compact 
representation of the associated Hessian matrix that is used to compute Newton direction for our 
IP method. 

Proposition 4.3. Let $ be defined in (|52p and the associated (p he defined in pop . Then the 
gradient and Hessian of cj) at any x G svec(5™^_) are given by 

V(t,{x) = -svec(X"^A:A'^X"^), (53) 
V20(x) = 2X-^(^s{X-^KK'^X-^), (54) 

where X = smat(a:). 
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Proof. ([55)1 follows immediately from ([55)1 with p = ~1. We now prove ([5^ . Let a; G svec(iS™^_) 
be arbitrarily chosen, and let X = smat(a;). For all sufficiently small h £ sjf"i{m+i)/2 ^ observe 
X + H y 0, where H = smat(ft,). In view of the definitions of X and H, it then follows from (f53|) . 
([37| and ([281) that 

V0(x + /i) - V0(a;) = -svec((X + i7)-iX/s:^(X + i7)-i -X-i/sTX^X-i) 

= svec{X -^HX-^KK^X-^ + X-^KK^X-^HX-^) + o{svec{H)) 
= 2X-'^(g),{X-'^KK^X-'^)h + oih), 
which proves ((54)) . ■ 

Since the A-criterion is a special case of the pth mean criterion, it follows from Proposition |42] 
that the rank of V^(/)(a;) is also m(m + l)/2 — (m — k){m — fc + l)/2 for every x E svec(5"'i|_). 

4.3 IP method for D-criterion 

Recall from Section [1] that in the D-criterion $ becomes 

= \ogdet{K'^X-^K) (55) 

for some K g 3?™^'^ with full column rank. It is easy to verify that Assumption 13.11 is satisfied. 
Hence, problem ([T]) with this criterion can be suitably solved by the IP method studied in Section|31 
In the next proposition, we provide formulas for computing gradient and Hessian of the associated 
function (j) that are used in the IP method. The proof is similar to that of Proposition 14.31 and is 
thus omitted. 

Proposition 4.4. Let $ be defined in (j55l) and the associated (p he defined in pOp . Then the 
gradient and Hessian of (j) at any x G svec(iS!|"_|_) are given by 

V(t){x) = -swec{X-^KWK'^X-^), 

V'^(j){x) = 2X-^®s{X-^KWK'^X-'^) - {X-^KWK'^X-^)®s{X-^KWK'^X'^), (56) 

where X = smat(a;) and W = {K^X-^K)-\ 

We next determine the rank of V^0(X) at any x G svec(iS™^). 

Proposition 4.5. Let $ be defined in (j55p and the associated (j) be defined in pop . Then the rank 
ofV'^(t){x) is m{m + l)/2 — (m — k){m — fc + l)/2 for any x G svec(5™_^). 

Proof. Let x G svec(iS™^) be arbitrarily chosen. Define X = smat(x). As in the proof of 
Proposition l4.2[ to determine the rank of V^(/)(a;), it suffices to know the dimension of Null(V^(/)(a;)). 
Notice that is a twice differentiable convex function in svec(5™_^). Thus, W'^(j){x) >: 0. It implies 
that h G Null(V2(?!)(a;)) if and only if h^V'^(j){x)h = 0. In view of ^ and it is not hard to 
verify that h^V^4){x)h = if and only if 

2tT{HX-^HX~^KWK'^X-^) - tT{HX-^KWK^X-^HX-^KWK^X-^) = 0, 

where H = smat(ft,). In addition, we can observe that (H^ also holds for p = 0, and hence 

tT{HX~^HX-^KWK'^X-'^) > It{HX-^KWK^X-^HX-^KWK^X-^). 

Furthermore, 

It{HX-^HX-^KWK^X-^) = iT:{[{X-^KWK'^X-^)^H]X-^[H{X-^KWK'^X-^)^]^ > 0. 
The above relations imply that h^V'^(f>{x)h = if and only if 

it{hx-^hx-^kwk'^x-^) = 0, 

which together with definition of W and the same arguments used in implies that 

h'^v^(j){x)h ^ k'^x-^h = 0. 

The rest of the proof follows similarly as that of Proposition ■ 
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5 Computational results 



In this section, we conduct numerical experiments to test the performance of the IP method 
discussed in this paper for solving problem ([T|) with A-, D- and pth mean criterion and also 
compare its performance with the multiplicative algorithm. 

We develop Matlab codes for our IP method to solve ([T]) with A-, D- and pth mean criterion. We 
also implement the multiplicative algorithm in Matlab for solving ([1} with A-, D- and pth mean 
criterion. To benchmark the performance of our IP method, we also report the computational 
results using a general SDP solver, namely, SDPT3 [30l |34] (Version 4.0) on solving a linear SDP 
reformulation of ([T]) with A-criterion (see \]A, Page 532]) and a log-determinant SDP reformulation 
of (H)) with D-criterion (see Equation (9)]). We shall mention that it is not clear whether 
problem ([T|) with pth mean criterion can be reformulated into a problem that can be efficiently 
solved by SDPT3. As SDPT3 implements an infeasible path- following algorithm, we project the 
approximate solution w found by SDPT3 onto the unit simplex to obtain an approximate optimal 
feasible solution for problem ([I} and the final objective value reported in our tests is based on 
the latter solution. All computations in this section are performed in Matlab 7.13.0 (2011b) on a 
workstation with an Intel Xeon E5410 CPU (2.33 GHz) and 8GB RAM running Red Hat Enterprise 
Linux (kernel 2.6.18). 

For our IP method, we set = ie e W'-^, = 10, /3 = 7 = 0.5, a = 0.1 and ri = 0.95. In 
addition, we set e(/i) = max{/i, le — 10} and terminate the algorithm once /i/t < le ~ 10. On the 
other hand, for the multiplicative algorithm, similarly as in |37j . we set X — 1, = —e 3?", and 
terminate the algorithm when it reaches 10000 iterations or 

n 

max di{w'') < (1 + (5) w^^di{w^) 

l<i<n ^ — ^ 
1=1 

holds with (5 = 2e — 4, where di{w) is defined in ([2]). Furthermore, for SDPT3, we use the default 
tolerance. Finally, we use the mex files skron, smat and svcc from the SDPT3 package for efficient 
operations on symmetric matrices in our implementation of the IP method and the multiplicative 
algorithm. 

In our tests below, we consider the following four design spaces: 

Xi{n) = {x^ = (e~*%s,,e-^%e-2«',Sie-2^')^, 1 < i < n}, 
X2{n) {x, = (1, s,, sl,s1)^, l<i<n}, 

Xsin) = {x(^^l)n+J = {l,n,rf,tj,ntj)'^,l < i,j < n}, 

Xiin) = {xi = {U, tf,sm{2TTti),cos{2TTti))'^, I <i <n}, 

where Si = ri = ^ — 1 and ti — ^. The space Xi("-) represents the linearization of a com- 
partmental model [3]. The space X2("-) corresponds to polynomial regression. The third space, as 
described in [38], represents a response surface with a nonlinear effect and an interaction, while 
the fourth space is the quadratic/trigonometric example proposed in |36j . The test sets xi: X3 ^-nd 
a variant of the test set X2 are also used in [38]. 

In our first test, for each design space, we set Ai — Xixf for i = 1, . . . ,n, with n ~ 10000, 
50000, 100000. For each n and each design space, we randomly generate 30 different matrices 
K e 3?™^'^ (i.e., we set fc = 3), each having i.i.d. Gaussian entries of mean and variance 1. 
We then apply our IP method and the multiplicative algorithm to solve problem (jlj with A-, 
D- and pth mean criterion on these instances and also apply SDPT3 to solve (jlJ with A- and 
D-criterion. The computational results averaged over the 30 instances are reported in Tables [T]- 
21 In particular, the performance of our IP method, the multiplicative algorithm and SDPT3 
are reported under the columns named "IP", "MUL" and "SDPT3", respectively. In addition, 
the CPU time abbreviated as "cpu" is in seconds and the objective value abbreviated as "obj" 
is rounded off to six significant digits. We see that our IP method significantly outperforms the 
multiplicative algorithm in terms of CPU time, and gives a smaller objective value in all instances. 
Moreover, our IP method also outperforms SDPT3 in CPU time and gives a smaller objective value 
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Table 1: Computational results for A-criterion with random K 









cpu 






obj 




Xi 


n 


MUL 


IP 


SDPT3 


MUL 


IP 


SDPT3 


1 


10000 


14.30 


0.80 


2.19 


111511 


110530 


113966 


1 


50000 


67.75 


4.05 


10.92 


120146 


119140 


133893 


1 


100000 


142.77 


8.17 


18.76 


187986 


186276 


215851 


2 


10000 


17.41 


0.84 


1.98 


206.959 


203.762 


203.762 


2 


50000 


78.66 


4.20 


10.45 


269.029 


264.054 


264.055 


2 


100000 


177.25 


8.19 


22.26 


256.361 


251.375 


251.391 


3 


100 


41.15 


1.08 


2.30 


42.1423 


42.1203 


42.1203 


3 


200 


139.58 


4.97 


14.24 


54.7132 


54.6797 


54.6879 


3 


300 


351.40 


11.71 


33.84 


54.0963 


54.078 


54.0817 


4 


10000 


13.02 


0.94 


1.81 


560.199 


545.569 


545.569 


4 


50000 


61.18 


4.77 


9.84 


466.661 


455.301 


455.301 


4 


100000 


138.10 


9.19 


20.73 


611.606 


597.149 


597.175 



Table 2; Computational results for D-criterion with random K 









cpu 






obj 




Xi 


n 


MUL 


IP 


SDPT3 


MUL 


IP 


SDPT3 


1 


10000 


1.26 


0.97 


1.46 


19.7352 


19.7347 


19.7356 


1 


50000 


5.13 


4.74 


5.96 


19.9312 


19.9307 


19.933 


1 


100000 


14.20 


8.72 


12.29 


19.7973 


19.7968 


19.7987 


2 


10000 


1.95 


0.77 


1.49 


5.95269 


5.95229 


5.9523 


2 


50000 


20.88 


3.91 


6.29 


5.30436 


5.3039 


5.3039 


2 


100000 


53.35 


7.85 


13.26 


5.08652 


5.08608 


5.08609 


3 


100 


4.18 


1.05 


1.76 


6.58713 


6.58694 


6.58694 


3 


200 


19.01 


4.37 


8.80 


6.65124 


6.65104 


6.65103 


3 


300 


68.40 


9.91 


21.89 


6.74346 


6.74327 


6.74382 


4 


10000 


1.45 


0.96 


1.33 


7.40587 


7.40535 


7.40535 


4 


50000 


13.47 


4.51 


5.96 


7.65401 


7.6535 


7.6535 


4 


100000 


40.19 


8.33 


12.08 


8.66619 


8.66575 


8.66574 



in most instances. Furthermore, it is worth pointing out that SDPT3 early terminates when solving 
some instances for xi with A-criterion, possibly due to bad scaling of M{w). This accounts for its 
significantly larger objective values in Table [T] corresponding to xi. Finally, for pth mean criterion 
with p < — 1, our IP method achieves significantly better objective values than the multiplicative 
algorithm, where the objective value of the latter algorithm is chosen to be the minimum over 
all iterations (see Table U) . This phenomenon is actually not surprising since the multiplicative 
algorithm is only known to converge for p € (—1,0), but it may not converge when p < —1. 

In our second test, we consider the case when K = I. The instances used in this test are the 
same as those in the first test except K = I. We also apply our IP method and the multiplicative 
algorithm to solve problem ([T]) with A-, D- and pth mean criterion on these instances and apply 
SDPT3 to solve ([Ij with A- and D-criterion. The computational results are reported in Tables [5]- 
m We again observe that our IP method outperforms the multiplicative algorithm in terms of 
objective value in all instances, and is generally much faster on large instances. Furthermore, our 
IP method is usually faster than SDPT3 and produces comparable or smaller objective values. 

6 Concluding remarks 

In this paper we propose a primal IP method for solving problem ([l} with a large class of convex 
optimality criteria and establish its global convergence. We demonstrate how the Newton direction 
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Table 3: Computational results for pth mean criterion with random K for some p £ ( — 1, 0) 









P = 


= -0.25 






P = 


-0.75 








cpu 


obj 


cpu 


obj 


Xi 


n 


MUL 


IP 


MUL 


IP 


MUL 


IP 


MUL 


IP 


1 


10000 


6.43 


0.90 


25.4567 


25.4558 


9.22 


0.84 


5187.73 


5187.23 


1 


50000 


38.46 


4.00 


25.1902 


25.1894 


48.09 


3.83 


7128.51 


7126.32 


1 


100000 


87.55 


7.84 


25.1312 


25.1304 


126.19 


7.69 


7207.59 


7205.68 


2 


10000 


6.24 


0.81 


5.68067 


5.68046 


13.76 


0.86 


46.4144 


46.4008 


2 


50000 


28.79 


3.71 


6.00911 


6.00886 


75.97 


4.02 


58.2028 


58.1903 


2 


100000 


74.43 


7.56 


6.12458 


6.12434 


159.49 


7.80 


60.9256 


60.9108 


3 


100 


5.45 


0.97 


5.58387 


5.58379 


3.65 


1.04 


24.8691 


24.868 


3 


200 


19.26 


4.23 


5.56727 


5.56718 


17.44 


4.88 


23.5463 


23.5451 


3 


300 


60.36 


10.64 


5.45907 


5.45899 


58.36 


11.92 


24.7679 


24.7664 


4 


10000 


1.71 


0.94 


7.30484 


7.30456 


2.73 


0.96 


118.871 


118.859 


4 


50000 


8.81 


4.57 


7.27622 


7.27589 


10.92 


4.86 


108.079 


108.066 


4 


100000 


32.17 


9.54 


7.30129 


7.30102 


46.79 


10.30 


128.676 


128.662 



Table 4: Computational results for pth mean criterion with random K for some p < — 1 









P = 


= -1.1 








p = -1.2 








cpu 


ob 


i 


cpu 


obj 


Xi 


n 


mul 


IP 


mul 


IP 


mul 


IP 


mul 


IP 


1 


10000 


4.54 


0.73 


611960 


602294 


4.12 


0.71 


1.46813e+06 


1.43891e+06 


1 


50000 


19.79 


3.45 


541355 


532904 


18.54 


3.43 


2.0649e+06 


2.02777e+06 


1 


100000 


49.28 


7.07 


371942 


365201 


52.07 


6.89 


1.79042e+06 


1.75803e+06 


2 


10000 


5.53 


0.85 


373.376 


359.802 


4.69 


0.84 


650.345 


629.288 


2 


50000 


20.36 


3.95 


492.463 


476.123 


20.16 


4.02 


667.047 


645 


2 


100000 


62.36 


7.92 


302.083 


288.88 


63.05 


7.91 


539.641 


514.087 


3 


100 


18.94 


1.12 


74.7421 


71.4397 


20.20 


1.18 


95.224 


88.142 


3 


200 


72.50 


5.00 


69.2354 


65.8478 


70.19 


5.06 


127.857 


116.933 


3 


300 


210.53 


12.43 


69.2994 


65.8143 


164.64 


12.88 


109.287 


100.475 


4 


10000 


6.16 


0.96 


961.75 


910.571 


6.76 


0.96 


1640.19 


1524.98 


4 


50000 


26.79 


4.88 


903.773 


846.954 


35.94 


4.90 


1631.8 


1520.71 


4 


100000 


77.24 


10.55 


824.269 


776.036 


77.94 


10.54 


1710.28 


1596.1 



Table 5: Computational results for A-criterion with K — I 









cpu 






obj 




X, 


n 


mul 


IP 


SDPT3 


mul 


IP 


SDPT3 


1 


10000 


13.65 


0.81 


2.35 


54286.3 


53848.3 


53848.4 


1 


50000 


65.59 


4.25 


12.39 


54245.2 


53807.3 


54103.8 


1 


100000 


142.17 


6.84 


27.21 


54240.1 


53802.1 


54103.8 


2 


10000 


16.40 


0.79 


1.80 


73.4521 


72.4443 


72.4443 


2 


50000 


78.93 


3.98 


10.40 


73.391 


72.385 


72.3853 


2 


100000 


169.00 


8.29 


19.71 


73.3837 


72.3778 


72.3777 


3 


100 


1.48 


0.95 


2.13 


21.6203 


21.6191 


21.6191 


3 


200 


13.03 


4.31 


11.08 


21.2826 


21.2812 


21.2812 


3 


300 


37.53 


9.84 


30.31 


21.1721 


21.1706 


21.1706 


4 


10000 


12.66 


0.96 


1.55 


174.279 


170.775 


170.775 


4 


50000 


61.20 


4.82 


8.97 


174.276 


170.775 


170.775 


4 


100000 


133.68 


9.48 


16.43 


174.277 


170.775 


170.776 



can be efficiently computed when the method is applied to solve problem Jl} with classical opti- 
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Tabic 6: Computational results for D-criterion with K = I 









cpu 






obj 




Xi 


n 


mul 


IP 


SDPT3 


mul 


IP 


SDPT3 


1 


10000 


0.92 


1.05 


0.86 


20.5125 


20.5119 


20.5125 


1 


50000 


4.52 


5.19 


3.81 


20.5098 


20.5091 


20.5091 


1 


100000 


14.89 


9.77 


7.56 


20.5094 


20.5087 


20.5088 


2 


10000 


1.78 


0.81 


1.07 


0.410745 


0.410221 


0.41022 


2 


50000 


16.99 


4.37 


5.06 


0.409964 


0.409267 


0.40926 


2 


100000 


57.42 


8.65 


9.05 


0.409795 


0.409154 


0.409145 


3 


100 


1.64 


0.95 


1.17 


5.14292 


5.14267 


5.14267 


3 


200 


16.11 


4.19 


6.26 


5.08236 


5.08212 


5.08211 


3 


300 


47.57 


8.78 


16.43 


5.06226 


5.06202 


5.06201 


4 


10000 


1.12 


0.95 


0.93 


7.25257 


7.25189 


7.25189 


4 


50000 


10.98 


5.13 


4.14 


7.25253 


7.2519 


7.25189 


4 


100000 


35.55 


10.46 


8.18 


7.25246 


7.2519 


7.25189 



Table 7: Computational results for pth mean criterion with K = I for some p € (—1,0) 









P 


= -0.25 






P 


= -0.75 








cpu 


obj 


cpu 


obj 


Xi 


n 


mul 


IP 


mul 


IP 


mul 


IP 


mul 


IP 


1 


10000 


7.63 


0.99 


23.3728 


23.372 


3.43 


0.83 


3635.71 


3635.29 


1 


50000 


43.86 


4.47 


23.3683 


23.3675 


25.34 


4.00 


3633.58 


3633.2 


1 


100000 


96.14 


8.47 


23.3677 


23.367 


60.04 


7.92 


3633.31 


3632.94 


2 


10000 


3.40 


0.76 


5.58855 


5.58838 


2.50 


0.83 


27.4836 


27.4811 


2 


50000 


21.13 


3.73 


5.58796 


5.58771 


11.49 


4.08 


27.4691 


27.4653 


2 


100000 


71.89 


7.03 


5.58785 


5.58763 


38.38 


7.95 


27.467 


27.4634 


3 


100 


1.61 


0.95 


6.70457 


6.70448 


1.53 


0.99 


14.1435 


14.1429 


3 


200 


14.36 


3.67 


6.68235 


6.68225 


13.45 


4.08 


13.9841 


13.9834 


3 


300 


42.54 


8.31 


6.675 


6.67491 


38.85 


9.11 


13.9318 


13.9311 


4 


10000 


1.60 


0.93 


7.25984 


7.25955 


1.30 


0.96 


52.2922 


52.286 


4 


50000 


9.24 


4.40 


7.25988 


7.25956 


5.68 


4.79 


52.2937 


52.286 


4 


100000 


30.82 


8.61 


7.25983 


7.25957 


20.84 


9.03 


52.2927 


52.2861 



Table 8: Computational results for pth mean criterion with K = I for some p < — 1 









P 


= -1.1 






P 


= -1.2 








cpu 




obj 


cpu 




obj 


Xi 


n 


mul 


IP 


mul 


IP 


mul 


IP 


mul 


IP 


1 


10000 


3.92 


0.75 


162818 


159210 


3.94 


0.67 


485415 


471459 


1 


50000 


17.17 


3.42 


162740 


159077 


17.24 


3.37 


482380 


471030 


1 


100000 


47.09 


7.55 


162732 


159060 


48.57 


7.21 


485149 


470975 


2 


10000 


3.85 


0.83 


108.922 


108.171 


3.84 


0.83 


165.133 


162.297 


2 


50000 


17.98 


4.06 


109.588 


108.072 


17.05 


4.08 


164.314 


162.133 


2 


100000 


47.35 


7.95 


109.495 


108.06 


46.04 


7.37 


165.458 


162.116 


3 


100 


37.11 


0.95 


25.9565 


25.7793 


36.67 


0.99 


31.8264 


30.8276 


3 


200 


144.83 


4.23 


25.599 


25.3307 


143.49 


4.33 


31.5254 


30.2362 


3 


300 


336.23 


9.56 


25.5115 


25.1841 


334.23 


9.54 


31.46 


30.0431 


4 


10000 


5.85 


0.98 


297.604 


277.597 


7.39 


0.92 


497.138 


453 


4 


50000 


25.46 


4.64 


297.686 


277.597 


34.57 


4.76 


497.287 


453 


4 


100000 


65.80 


9.13 


297.696 


277.597 


81.54 


9.23 


497.306 


453 



mality criteria. Our computational results show that the IP method outperforms the widely used 
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multiplicative algorithm in both speed and solution quality. The codes for this paper, including 
our implementation of the multiplicative algorithm and our codes generating inputs for SDPT3, 
are available online at www.math.sfu.ca/~zhaosong. 

We would like to remark that the performance of our IP method depends on whether the Newton 
direction can be computed accurately and efficiently. In our implementation, we observe that for 
pth mean criterion with large \p\, as well as for the design space {xi = (1, s^, sf , sf , sj)'^ , 1 < i < n} 
with n > 50000 and some random K e 5ft™ ^'^j the Newton direction cannot be computed accurately 
due to numerical errors and hence our IP method fails to terminate with a good approximate 
solution, compared with the multiplicative algorithm. 

Finally, we also compare the performance of our IP method with KNITRO [TDI (Version 7.0) 
that is a general IP solver for solving smooth convex and nonconvex optimization problems. In 
particular, we call KNITRO from its Matlab interface with algorithm option interior to solve 
problem ([T]) for pth mean criterion with K = I. We also provide this solver with the subroutines 
that efficiently evaluate function, gradient and Hessian- vector multiplication. Furthermore, we use 
the default tolerance of the solver. We observe that our IP method is at least 10 times faster and 
produces a smaller objective value for all instances described in Tables [7] and [H 
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